In this paper, we describe how information obtained from multiple views using a network of cameras can be effectively combined to yield a reliable and fast human activity recognition system. First, we present a score-based fusion technique for combining information from multiple cameras that handles arbitrary orientation of the subject with respect to the cameras and does not rely on a symmetric deployment of the cameras. Second, we describe how longer, variable-duration, interleaved action sequences can be recognized in real time from continuously streaming multi-camera data. Our framework does not depend on any particular feature extraction technique, and as a result, the proposed system can easily be integrated on top of existing implementations of view-specific classifiers and feature descriptors. For the implementation and testing of the proposed system, we have used computationally simple, locality-specific motion information extracted from the spatio-temporal shape of a human silhouette as our feature descriptor. This lends itself to an efficient distributed implementation while maintaining a high frame capture rate. We demonstrate the robustness of our algorithms by implementing them on a portable multi-camera video sensor network testbed and evaluating system performance under different camera network configurations.
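The exact fusion rule is developed in the body of the paper; purely as an illustration of what score-level fusion across cameras looks like (not the authors' specific method), the sketch below combines per-camera class score vectors using hypothetical reliability weights, which in a real deployment could reflect how well each view captures the subject's current orientation.

```python
import numpy as np

def fuse_camera_scores(camera_scores, camera_weights=None):
    """Illustrative score-level fusion across cameras (assumed interface).

    camera_scores  : list of 1-D arrays, one per camera, each holding the
                     view-specific classifier's score for every action class.
    camera_weights : optional per-camera reliability weights; defaults to
                     uniform weighting when omitted.
    """
    scores = np.asarray(camera_scores, dtype=float)   # shape: (n_cameras, n_classes)
    if camera_weights is None:
        camera_weights = np.ones(len(scores))
    w = np.asarray(camera_weights, dtype=float)
    w = w / w.sum()                                    # normalize weights to sum to 1
    fused = w @ scores                                 # weighted sum of score vectors
    return int(np.argmax(fused)), fused               # fused decision and fused scores

# Hypothetical example: three cameras scoring four candidate actions.
cam_scores = [
    [0.10, 0.70, 0.15, 0.05],   # camera with a good frontal view
    [0.25, 0.40, 0.20, 0.15],   # oblique view, less confident
    [0.30, 0.35, 0.20, 0.15],   # near-profile view
]
label, fused = fuse_camera_scores(cam_scores, camera_weights=[0.5, 0.3, 0.2])
print(label, fused)
```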